THE POLYNOMIAL MULTIDIMENSIONAL SZEMERÉDI THEOREM

ALONG SHIFTED PRIMES 



NIKOS FRANTZIKINAKIS , BERNARD HOST, AND BRYNA KRA 

Abstract. If $q_1, \ldots, q_m \colon \mathbb{Z} \to \mathbb{Z}^\ell$ are polynomials with zero constant terms and
$E \subset \mathbb{Z}^\ell$ has positive upper Banach density, then we show that the set
$E \cap (E - q_1(p-1)) \cap \ldots \cap (E - q_m(p-1))$ is nonempty for some prime $p$. We also prove mean
convergence for the associated averages along the prime numbers, conditional on analogous
convergence results along the full integers. This generalizes earlier results of the authors,
of Wooley and Ziegler, and of Bergelson, Leibman and Ziegler.



1. Introduction 

1.1. Background and new results. Recent advances in ergodic theory and number
theory have led to numerous results on patterns in subsets of the integers with positive
upper density, with descriptions of possible restrictions on differences between successive
terms. In this vein, we show that the parameters in the polynomial multidimensional
Szemerédi Theorem of Bergelson and Leibman [1] can be restricted to the shifted primes.

Let $\mathbb{P}$ denote the set of prime numbers and define the upper Banach density $d^*(E)$
of a set $E \subset \mathbb{Z}^\ell$ as $d^*(E) = \limsup_{|I| \to \infty} \frac{|E \cap I|}{|I|}$, where the limsup is taken over all
parallelepipeds $I \subset \mathbb{Z}^\ell$ whose side lengths tend to infinity.

Theorem 1.1. Let $\ell, m \in \mathbb{N}$, $q_1, \ldots, q_m \colon \mathbb{Z} \to \mathbb{Z}^\ell$ be polynomials with $q_i(0) = 0$ for
$i = 1, \ldots, m$, and let $E \subset \mathbb{Z}^\ell$ with upper Banach density $d^*(E) > 0$. Then the set of
integers $n$ such that

$$d^*\big(E \cap (E - q_1(n)) \cap \ldots \cap (E - q_m(n))\big) > 0$$

has nonempty intersection with $\mathbb{P} - 1$ and $\mathbb{P} + 1$.

In fact, our argument shows this intersection has positive relative density in the shifted 
primes. 

The first result in this direction was due to Sárközy [18], who used analytic number
theory to show that the difference set $E - E$ for a set $E$ of positive upper Banach
density contains a shifted prime $p - 1$ for some $p \in \mathbb{P}$ (and similarly, as for all the results
stated here, a shifted prime of the form $p + 1$). In [7], relying on strong uniformity results
of [11] related to the primes combined with Roth's theorem on arithmetic progressions, we
took a first step towards a multiple version, showing that such $E$ contains an arithmetic
progression of length 3 whose common difference is a shifted prime. This was generalized
in two ways. First, Wooley and Ziegler [21] proved Theorem 1.1 for $\ell = 1$, relying on a
deep ergodic structure theorem and milder number theoretic input than used in [7]. More
recently, Bergelson, Leibman, and Ziegler [5] proved Theorem 1.1 for linear polynomials
$q_1, \ldots, q_m$, by combining the ergodic results on IP-recurrence of [9] and the uniformity
results related to the primes of [11], [12], and [13] (their proof also gives the partition
version of our main result in full generality). Theorem 1.1 generalizes the results of [21]
and [5], and is in the spirit of [7], with the main ingredients being the number theoretic
uniformity results of [11], [12], and [13] and a uniform version of the polynomial Szemerédi
theorem [1], [3].

By the Furstenberg Correspondence Principle (see Section 2.1 below), Theorem 1.1 is
equivalent to an ergodic version and this is the version that we prove.

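Theorem 1.2. Let $\ell \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, and let $T_1, \ldots, T_\ell \colon X \to X$
be commuting invertible measure preserving transformations. Let $m \in \mathbb{N}$, $q_{i,j} \colon \mathbb{Z} \to \mathbb{Z}$ be
polynomials with $q_{i,j}(0) = 0$ for $i = 1, \ldots, \ell$ and $j = 1, \ldots, m$. Then for any $A \in \mathcal{X}$ with
$\mu(A) > 0$, the set of integers $n$ such that

$$\mu\Big(A \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(n)}\big) A \cap \ldots \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(n)}\big) A\Big) > 0$$

has nonempty intersection with $\mathbb{P} - 1$ and $\mathbb{P} + 1$.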

We also prove mean convergence results for the corresponding multiple ergodic aver- 
ages over the primes, conditional on the convergence of the corresponding averages over 
the full set of natural numbers (in some cases these results are not known). 

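Theorem 1.3. Let $\ell, m \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, $T_1, \ldots, T_\ell \colon X \to X$ be
commuting invertible measure preserving transformations, and $f_1, \ldots, f_m \in L^\infty(\mu)$ be
functions. For $i = 1, \ldots, \ell$ and $j = 1, \ldots, m$, let $q_{i,j} \colon \mathbb{Z} \to \mathbb{Z}$ be polynomials. Suppose
that the averages

$$\frac{1}{N} \sum_{n=1}^{N} f_1\Big(\big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(an+b)}\big) x\Big) \cdot \ldots \cdot f_m\Big(\big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(an+b)}\big) x\Big) \tag{1}$$

converge in $L^2(\mu)$ as $N \to \infty$ for all integers $a, b \geq 1$. Then the averages

$$\frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} f_1\Big(\big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(p)}\big) x\Big) \cdot \ldots \cdot f_m\Big(\big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(p)}\big) x\Big), \tag{2}$$

where $\pi(N)$ denotes the number of primes up to $N$, also converge in $L^2(\mu)$ as $N \to \infty$.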

Convergence of (2) when $\ell = m = 1$ was proved by Wierdl [20] (more generally he
showed pointwise convergence, an issue that we do not address here). When all the
transformations are equal and one restricts to linear polynomials, we proved convergence
of (2) in [7], but for $m \geq 3$ this was conditional upon the results of [12] and [13] that
were subsequently proven. In the case where all the transformations are equal, convergence
of (2) was proved by Wooley and Ziegler in [21]. Combined with the convergence
results of [H] and [17], Theorem 1.3 recovers the convergence results of [21]. Using the
convergence results of [15], we obtain the new result of mean convergence for the linear
averages

$$\frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} f_1(T_1^{p} x) \cdot \ldots \cdot f_\ell(T_\ell^{p} x),$$

and combined with the results of [6], we have mean convergence for other new cases, for
example the averages

$$\frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} f_1(T_1^{p} x) \cdot f_2(T_2^{p^2} x) \cdot \ldots \cdot f_\ell(T_\ell^{p^\ell} x).$$

Combining with the convergence results of pQ and [2], we have mean convergence of the
averages

$$\frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} f_1(T_1^{p^2} x) \cdot f_2(T_1^{p^2} T_2^{p} x).$$

1.2. Strategy of the proof. We prove Theorems 1.2 and 1.3 by reducing the problem
to a deep result on the uniformity of the modified von Mangoldt function (Theorem 2.2
below). The main idea is to compare the multiple ergodic averages along the primes
with the corresponding ones along the natural numbers, and show that the difference
between the two converges to zero in mean. Some variation of this idea holds and is given
in Proposition 3.6. The proof of this follows by successive applications of the van der
Corput lemma and a straightforward PET (polynomial exhaustion technique) induction
argument, reducing the problem to the aforementioned uniformity result. Given the
comparison result of Proposition 3.6, the proof of Theorem 1.3 follows in a straightforward
manner from known convergence results, and the proof of Theorem 1.2 follows similarly,
with the additional input of a uniform version of the polynomial Szemerédi theorem.

1.3. Further directions. Combining the method of this paper with the multiple
recurrence result and methods of [TB], one can show that Theorem 1.2 holds under the
relaxed assumption that the transformations $T_1, \ldots, T_\ell$ generate a nilpotent group (and
thus obtain further combinatorial implications, as in [16]). Likewise the obvious
extension of Theorem 1.3 to the nilpotent case holds. In both cases, the necessary new
ingredient is an extension of the uniformity estimate of Lemma 3.5 to the case that the
transformations $T_1, \ldots, T_\ell$ generate a nilpotent group, which can be proved using the
PET induction scheme in [16]. We do not carry this out here.

A more challenging problem is the extension of Theorems 1.2 and 1.3 to sequences
involving fractional powers. For example, one could hope to show that for any positive
real numbers $a$ and $b$, any $E \subset \mathbb{Z}$ with $d^*(E) > 0$ contains patterns of the form
$m, m + [p^a], m + 2[p^a]$, or patterns of the form $m, m + [p^a], m + [p^b]$, for some $m \in \mathbb{N}$ and $p \in \mathbb{P}$. If
one is to use the methods of this paper, one would need to prove an appropriate variant
of Lemma 3.5, a seemingly nontrivial result.

Lastly, we mention that for two or more transformations, even the simplest pointwise
variants of the mean convergence results we have established remain open. For example,
it is not known if for a probability space $(X, \mathcal{X}, \mu)$, measure preserving transformation
$T \colon X \to X$, and functions $f_1, f_2 \in L^\infty(\mu)$, the averages
$\frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} f_1(T^{p} x) \cdot f_2(T^{2p} x)$,
or the averages $\frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} f_1(T^{p} x) \cdot f_2(T^{p^2} x)$, converge pointwise as $N \to \infty$. As
a first step one could try to prove a pointwise variant of Theorem 1.3 by using the
method of this paper. The missing ingredient is an appropriate quantitative variant of
Theorem 2.2.

1.4. General conventions and notation. We denote the positive integers by $\mathbb{N} =
\{1, 2, \ldots\}$ and write $\mathbb{Z}_N = \mathbb{Z}/N\mathbb{Z}$; when needed, the set $\mathbb{Z}_N$ is identified with $\mathbb{N} \cap [1, N]$.
If $f$ is a measurable function on a measure space $X$ with transformation $T \colon X \to X$,
we write $Tf = f \circ T$. If $S$ is a finite set and $a \colon S \to \mathbb{C}$, then we write
$\mathbb{E}_{n \in S}\, a(n) = \frac{1}{|S|} \sum_{n \in S} a(n)$. We use the symbol $\ll$ when some expression is majorized by a constant
multiple of some other expression. If this constant depends on variables $k_1, \ldots, k_\ell$, we
write $\ll_{k_1, \ldots, k_\ell}$. We use $o_N(1)$ to denote a quantity that converges to zero when $N \to \infty$
and all other parameters are fixed.

2. Background 

2.1. Furstenberg correspondence principle. We state a modification of the corre- 
spondence principle of Furstenberg (the formulation given is similar to the one in [1]): 

Furstenberg Correspondence Principle ([8]). Let $\ell \in \mathbb{N}$ and $E \subset \mathbb{Z}^\ell$. There exist
a probability space $(X, \mathcal{X}, \mu)$, commuting invertible measure preserving transformations
$T_1, \ldots, T_\ell \colon X \to X$, and a set $A \in \mathcal{X}$ with $\mu(A) = d^*(E)$, such that

$$d^*\big(E \cap (E - n_1) \cap \ldots \cap (E - n_m)\big) \geq \mu\Big(A \cap \big(\prod_{i=1}^{\ell} T_i^{n_{i,1}} A\big) \cap \ldots \cap \big(\prod_{i=1}^{\ell} T_i^{n_{i,m}} A\big)\Big)$$

for all $m \in \mathbb{N}$ and $n_j = (n_{1,j}, \ldots, n_{\ell,j}) \in \mathbb{Z}^\ell$ for $j = 1, \ldots, m$.

In particular, this correspondence shows that Theorem 1.1 follows from Theorem 1.2.

2.2. Averages along the primes and weighted averages. Let $\Lambda \colon \mathbb{N} \to \mathbb{R}$ denote
the von Mangoldt function, taking the value $\log p$ on a prime $p$ and its powers and 0
elsewhere, and let

$$\Lambda'(n) = \mathbf{1}_{\mathbb{P}}(n) \cdot \Lambda(n)$$

for $n \in \mathbb{N}$. Throughout, the roles of $\Lambda$ and $\Lambda'$ are interchangeable, and all the results
can be proven for either function (as the contribution from prime powers greater than 1
is negligible in our averages); in this article the function $\Lambda'$ appears more naturally and
so we prove the results for this version.

The following lemma is classical (for a proof, see for example [7]) and allows us to 
relate averages over the primes with weighted averages over the integers: 

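Lemma 2.1. If $a \colon \mathbb{N} \to \mathbb{C}$ is bounded, then

$$\Big| \frac{1}{\pi(N)} \sum_{p \in \mathbb{P},\, p \leq N} a(p) - \frac{1}{N} \sum_{n=1}^{N} \Lambda'(n) \cdot a(n) \Big| = o_N(1).$$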
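As a numerical illustration of Lemma 2.1 (a minimal sketch, not part of the argument), the two averages can be compared for a bounded sequence. The function names `primes_up_to`, `prime_average`, `weighted_average` and the test sequence `parity_sign` are hypothetical helpers introduced only for this example.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: return the list of primes <= limit (limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def prime_average(a, N):
    """(1/pi(N)) * sum of a(p) over the primes p <= N."""
    primes = primes_up_to(N)
    return sum(a(p) for p in primes) / len(primes)


def weighted_average(a, N):
    """(1/N) * sum over n <= N of Lambda'(n) * a(n).

    Lambda'(n) equals log(n) when n is prime and 0 otherwise, so only
    primes contribute to the sum.
    """
    return sum(math.log(p) * a(p) for p in primes_up_to(N)) / N


def parity_sign(n):
    """A bounded test sequence: +1 on even n, -1 on odd n."""
    return 1.0 if n % 2 == 0 else -1.0


# The gap between the two averages should be small for large N.
gap = abs(prime_average(parity_sign, 20000) - weighted_average(parity_sign, 20000))
```

The gap tends to zero only slowly, at a rate governed by the error term in the prime number theorem, so moderate values of N give only rough agreement.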



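In particular, the average in (2) is asymptotically equal to the weighted average over
the natural numbers:

$$\frac{1}{N} \sum_{n=1}^{N} \Lambda'(n) \cdot f_1\Big(\big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(n)}\big) x\Big) \cdot \ldots \cdot f_m\Big(\big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(n)}\big) x\Big).$$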

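2.3. Gowers norms. If $a \colon \mathbb{Z}_N \to \mathbb{C}$, we inductively define:

$$\|a\|_{U_1(\mathbb{Z}_N)} = \big| \mathbb{E}_{n \in \mathbb{Z}_N}\, a(n) \big|$$

and

$$\|a\|_{U_{d+1}(\mathbb{Z}_N)} = \Big( \mathbb{E}_{h \in \mathbb{Z}_N} \|a_h \cdot \bar{a}\|_{U_d(\mathbb{Z}_N)}^{2^d} \Big)^{1/2^{d+1}},$$

where $a_h(n) = a(n + h)$. Gowers [10] showed that for $d \geq 2$ this defines a norm on functions on $\mathbb{Z}_N$.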
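The inductive definition translates directly into a short recursive computation. The following is a minimal sketch for small $N$ only (the cost grows like $N^d$); the names `gowers_power` and `gowers_norm` are hypothetical and a function on $\mathbb{Z}_N$ is represented as a list of $N$ numbers.

```python
import cmath


def gowers_power(a, d):
    """Return ||a||_{U_d(Z_N)} raised to the power 2**d, for a list a of length N."""
    N = len(a)
    if d == 1:
        # Base case: ||a||_{U_1}^2 = |E_n a(n)|^2.
        return abs(sum(a) / N) ** 2
    total = 0.0
    for h in range(N):
        # (a_h * conj(a))(n) = a(n + h) * conj(a(n)), indices taken modulo N.
        shifted_product = [a[(n + h) % N] * a[n].conjugate() for n in range(N)]
        total += gowers_power(shifted_product, d - 1)
    return total / N


def gowers_norm(a, d):
    """Return ||a||_{U_d(Z_N)}."""
    return gowers_power(a, d) ** (1.0 / 2 ** d)


# A nontrivial character has U_1 norm 0 but U_2 norm 1.
N = 8
character = [cmath.exp(2j * cmath.pi * n / N) for n in range(N)]
u1 = gowers_norm(character, 1)
u2 = gowers_norm(character, 2)
```

The character example shows why the restriction $d \geq 2$ is needed: the $U_1$ quantity vanishes on a nonzero function, so it is only a seminorm.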

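2.4. Uniformity of the modified von Mangoldt function. For $w > 2$ let

$$W = \prod_{p \in \mathbb{P},\, p < w} p$$

denote the product of the primes bounded by $w$. For $r \in \mathbb{N}$ let

$$\Lambda'_{w,r}(n) = \frac{\phi(W)}{W} \cdot \Lambda'(Wn + r),$$

where $\phi$ denotes the Euler function.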
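The effect of this normalization can be checked numerically. The sketch below assumes the standard form $\Lambda'_{w,r}(n) = \frac{\phi(W)}{W} \Lambda'(Wn + r)$ for the modified function (an assumption, since the displayed formula is not legible in this copy); the helper names are hypothetical. With $w = 6$ one gets $W = 30$ and $\phi(W) = 8$.

```python
import math


def is_prime(n):
    """Trial division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % p for p in range(2, math.isqrt(n) + 1))


def primorial_below(w):
    """W = product of the primes p < w."""
    return math.prod(p for p in range(2, w) if is_prime(p))


def euler_phi(n):
    """Euler's function: the number of 1 <= k <= n coprime to n."""
    return sum(1 for k in range(1, n + 1) if math.gcd(k, n) == 1)


def modified_von_mangoldt(n, w, r):
    """Assumed normalization: (phi(W) / W) * Lambda'(W * n + r)."""
    W = primorial_below(w)
    value = W * n + r
    if not is_prime(value):
        return 0.0
    return euler_phi(W) / W * math.log(value)


# For r coprime to W the function has mean close to 1; for other r it is
# (essentially) identically zero.
N = 2000
mean_coprime = sum(modified_von_mangoldt(n, 6, 1) for n in range(1, N + 1)) / N
mean_non_coprime = sum(modified_von_mangoldt(n, 6, 2) for n in range(1, N + 1)) / N
```

The factor $\phi(W)/W$ compensates for the fact that only $\phi(W)$ of the $W$ residue classes modulo $W$ contain infinitely many primes, which is why $\Lambda'_{w,r} - 1$ is the natural quantity to measure in Gowers norm.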

The next result is key for our study. It was obtained in [11] (Theorem 7.2), conditional
upon results on the Möbius function later obtained in [12] (Theorem 1.1) and the inverse
conjecture for the Gowers norms (recently proved in [13]):

Theorem 2.2 (Green and Tao [11], [12]; Green, Tao, and Ziegler [13]). With the
previous notation, for every $d \in \mathbb{N}$, the maximum, taken over those $r$ between 1 and $W$
satisfying $(r, W) = 1$, of

$$\big\| (\Lambda'_{w,r} - 1) \cdot \mathbf{1}_{[1,N]} \big\|_{U_d(\mathbb{Z}_{dN})}$$

converges to 0 as $N \to \infty$ and then $w \to \infty$.

Note that in [11] (Theorem 7.2), the result is stated with $w$ being a specific slowly
growing function of $N$, but the authors also note that any sufficiently slowly growing function
of $N$ works too, and this implies our version. Furthermore, in [11] the theorems are
stated without the indicator function $\mathbf{1}_{[1,N]}$, but the results of [11], [12], and [13] also
imply this version.

3. Comparing averages 

3.1. PET (polynomial exhaustion technique) induction. We describe the inductive
scheme from [3] and follow the notation and implementation used in [6]. Let
$\ell, m \in \mathbb{N}$. Given $\ell$ ordered families of polynomials

$$Q_1 = (q_{1,1}, \ldots, q_{1,m}), \ \ldots, \ Q_\ell = (q_{\ell,1}, \ldots, q_{\ell,m}),$$

we define an ordered family $(Q_1, \ldots, Q_\ell)$ of $m$ polynomial $\ell$-tuples by

$$(Q_1, \ldots, Q_\ell) = \big( (q_{1,1}, \ldots, q_{\ell,1}), \ldots, (q_{1,m}, \ldots, q_{\ell,m}) \big).$$

This gives a concise way of recording the polynomial iterates that appear in the average
of

$$f_1\big(T_1^{q_{1,1}(n)} \cdots T_\ell^{q_{\ell,1}(n)} x\big) \cdot \ldots \cdot f_m\big(T_1^{q_{1,m}(n)} \cdots T_\ell^{q_{\ell,m}(n)} x\big).$$

The maximum of the degrees of the polynomials in the families $Q_1, \ldots, Q_\ell$ is called the
degree of the family $(Q_1, \ldots, Q_\ell)$.

Fix an integer $s \geq 1$ and consider families of degree $\leq s$. For $i = 1, \ldots, \ell$, define $Q'_i$ to
be the (possibly empty) set given by:

$$Q'_i = \{ \text{nonconstant } q_{i,j} \in Q_i : q_{i',j} \text{ is constant for } i' < i \}.$$

Two polynomials are said to be equivalent if they have the same degree and the same
leading coefficient. For $i = 1, \ldots, \ell$ and $j = 1, \ldots, s$, we let $w_{i,j}$ denote the number of
distinct non-equivalent classes of polynomials of degree $j$ in the family $Q'_i$.

Define the (matrix) type of the family $(Q_1, \ldots, Q_\ell)$ to be the matrix

$$\begin{pmatrix} w_{1,s} & \ldots & w_{1,1} \\ w_{2,s} & \ldots & w_{2,1} \\ \vdots & & \vdots \\ w_{\ell,s} & \ldots & w_{\ell,1} \end{pmatrix}.$$

A matrix is said to be of matrix type zero if all the $w_{i,j}$ are zero, and this happens
exactly when all the polynomials are constant.

We order the types lexicographically: given two $\ell \times s$ matrices $W = (w_{i,j})$ and $W' =
(w'_{i,j})$, we say that $W$ is bigger than $W'$, and write $W > W'$, if $w_{1,s} > w'_{1,s}$, or $w_{1,s} = w'_{1,s}$
and $w_{1,s-1} > w'_{1,s-1}$, ..., or $w_{1,i} = w'_{1,i}$ for $i = 1, \ldots, s$ and $w_{2,s} > w'_{2,s}$, and so on. We
have:

Lemma 3.1. Every decreasing sequence of types of families of $\ell$-tuples of polynomials is
eventually stationary.

Thus applying some operation that reduces the type, after finitely many repetitions, 
the procedure terminates. Such an operation is described in the next subsection. 
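The type and its ordering can be computed mechanically. The following is a minimal sketch under the assumption that each polynomial is stored as a tuple of integer coefficients with the constant term first, and that all degrees are at most $s$; the names `degree` and `matrix_type` are hypothetical. Rows are returned as $(w_{i,s}, \ldots, w_{i,1})$, so that Python's built-in tuple comparison coincides with the lexicographic order described above.

```python
def degree(poly):
    """Degree of a polynomial given by its coefficients, constant term first.

    Constant polynomials (including the zero polynomial) have degree 0 here.
    """
    deg = 0
    for k, coeff in enumerate(poly):
        if coeff != 0:
            deg = k
    return deg


def matrix_type(families, s):
    """Type of (Q_1, ..., Q_l); families[i][j] is the polynomial q_{i+1, j+1}.

    Assumes every polynomial has degree at most s.
    """
    m = len(families[0])
    rows = []
    for i, family in enumerate(families):
        # leading[deg] collects the distinct leading coefficients seen in Q'_i.
        leading = {deg: set() for deg in range(1, s + 1)}
        for j in range(m):
            q = family[j]
            earlier_constant = all(degree(families[k][j]) == 0 for k in range(i))
            if degree(q) > 0 and earlier_constant:
                leading[degree(q)].add(q[degree(q)])
        rows.append(tuple(len(leading[deg]) for deg in range(s, 0, -1)))
    return tuple(rows)


# One transformation with iterates n^2, n^2 + n, 2n: the two quadratics are
# equivalent, so the type is the 1 x 2 matrix (1 1).
single = matrix_type([[(0, 0, 1), (0, 1, 1), (0, 2)]], 2)

# Two transformations with iterates T_1^{n^2} and T_2^{n}.
double = matrix_type([[(0, 0, 1), (0,)], [(0,), (0, 1)]], 2)
```

Since every entry is a nonnegative integer and the matrices have fixed size, a strictly decreasing chain in this order must be finite, which is the content of Lemma 3.1.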

3.2. The van der Corput operation. Given a family $Q = (q_1, \ldots, q_m)$, $q \in \mathbb{Z}[t]$, and
$h \in \mathbb{N}$, we define the families $S_h Q$ and $Q - q$ as follows:

$$S_h Q = (S_h q_1, \ldots, S_h q_m) \quad \text{and} \quad Q - q = (q_1 - q, \ldots, q_m - q),$$

where $(S_h q)(n) = q(n + h)$.

Given a family of $\ell$-tuples of polynomials $(Q_1, \ldots, Q_\ell)$, an $\ell$-tuple $(q_1, \ldots, q_\ell) \in (Q_1, \ldots, Q_\ell)$,
and $h \in \mathbb{N}$, define the operation

$$(q_1, \ldots, q_\ell, h)\text{-vdC}(Q_1, \ldots, Q_\ell) = (Q_{1,h}, \ldots, Q_{\ell,h}),$$

where

$$Q_{i,h} = (S_h Q_i - q_i, \ Q_i - q_i),$$

for $i = 1, \ldots, \ell$ (note that this $Q_{i,h}$ is defined to be the concatenation of two tuples of
polynomials).
Starting with a family $(Q_1, \ldots, Q_\ell)$, we successively apply appropriate van der Corput
operations to arrive at constant families of $\ell$-tuples of polynomials. This is achieved
using:

Lemma 3.2 (Bergelson and Leibman [4]). Let $(Q_1, \ldots, Q_\ell)$ be a family of $\ell$-tuples of
polynomials with nonzero matrix type. Then there exists $(q_1, \ldots, q_\ell) \in (Q_1, \ldots, Q_\ell)$ such
that for every $h \in \mathbb{N}$, the family $(q_1, \ldots, q_\ell, h)$-vdC$(Q_1, \ldots, Q_\ell)$ has strictly smaller type
than $(Q_1, \ldots, Q_\ell)$.
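For a single transformation ($\ell = 1$) the van der Corput operation is easy to carry out by hand or by machine. The sketch below is illustrative only; it assumes polynomials are stored as tuples of integer coefficients with the constant term first, and the names `trim`, `shift`, `subtract` and `vdc_family` are hypothetical.

```python
from math import comb


def trim(poly):
    """Drop trailing zero coefficients (coefficients are listed constant term first)."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return tuple(poly)


def shift(poly, h):
    """Coefficients of S_h q, that is, of the polynomial n -> q(n + h)."""
    out = [0] * len(poly)
    for k, coeff in enumerate(poly):
        # Expand coeff * (n + h)^k with the binomial theorem.
        for j in range(k + 1):
            out[j] += coeff * comb(k, j) * h ** (k - j)
    return trim(out)


def subtract(p, q):
    """Coefficients of p - q."""
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return trim(x - y for x, y in zip(p, q))


def vdc_family(Q, q, h):
    """Q_h = (S_h Q - q, Q - q) for one family Q, as a concatenated tuple."""
    shifted = tuple(subtract(shift(p, h), q) for p in Q)
    unshifted = tuple(subtract(p, q) for p in Q)
    return shifted + unshifted


# Q = (n^2), q = n^2, h = 3 gives the family (6n + 9, 0).
result = vdc_family(((0, 0, 1),), (0, 0, 1), 3)
```

In this example the quadratic family $(n^2)$ is replaced by the family $(2hn + h^2, 0)$ of degree 1, so the type drops from $(1\ 0)$ to $(0\ 1)$.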

While this lemma is usually stated to hold for sufficiently large $h$, this is only in order
to maintain extra properties of the polynomial family (such as being essentially distinct),
and we do not need these properties here. Thus we are able to phrase this in the slightly
stronger, and easier to use for our purposes, setting of all $h \in \mathbb{N}$.

Assuming Lemma 3.2, the proof of the next result is standard:

Lemma 3.3. Let $(Q_1, \ldots, Q_\ell)$ be a family of $m$ polynomial $\ell$-tuples with nonzero matrix
type. Suppose that we successively apply the $(q_1, \ldots, q_\ell, h)$-vdC operation for appropriate
choices of $q_1, \ldots, q_\ell \in \mathbb{Z}[t]$ and $h \in \mathbb{N}$, as described in the previous lemma, each time
obtaining a family of $\ell$-tuples of polynomials with strictly smaller type. Then after a
finite number of operations, depending only on $\ell$, $m$, and the maximum degree of the
polynomials (but not on the successive choices of $h$), we obtain families of $\ell$-tuples of
polynomials of degree 0.

3.3. Controlling averages. We state a variation of a classical elementary estimate of 
van der Corput. 



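Lemma 3.4. Let $N \in \mathbb{N}$ and $v(1), \ldots, v(N)$ be elements of a Hilbert space $H$, with inner
product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$. Then

$$\Big\| \frac{1}{N} \sum_{n=1}^{N} v(n) \Big\|^2 \ll \frac{1}{N^2} \sum_{n=1}^{N} \|v(n)\|^2 + \frac{1}{N} \sum_{h=1}^{N} \Big| \frac{1}{N} \sum_{n=1}^{N-h} \langle v(n+h), v(n) \rangle \Big|.$$

For the case $H = \mathbb{R}$ and $\|\cdot\| = |\cdot|$, the proof is found, for example in [TJ]. The proof
in the general case is essentially identical.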
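The estimate can be checked numerically in the simplest Hilbert space $H = \mathbb{C}$. The sketch below is an illustration, not the proof; it uses the explicit constant 2 on the correlation term, which is what expanding $|\sum_n v(n)|^2$ and applying the triangle inequality gives (the lemma itself only asserts some absolute constant). The name `van_der_corput_sides` is hypothetical.

```python
import cmath
import random


def van_der_corput_sides(v):
    """Return (lhs, rhs) of the van der Corput estimate for complex numbers v(1..N).

    The right-hand side uses the explicit constant 2 on the correlation term.
    """
    N = len(v)
    lhs = abs(sum(v) / N) ** 2
    diagonal = sum(abs(x) ** 2 for x in v) / N ** 2
    correlations = 0.0
    for h in range(1, N):
        # Inner product in C: <x, y> = x * conj(y).
        inner = sum(v[n + h] * v[n].conjugate() for n in range(N - h)) / N
        correlations += abs(inner)
    return lhs, diagonal + 2 * correlations / N


# A quadratic phase sequence, similar in spirit to the model example below.
alpha = 2 ** 0.5
quadratic_phase = [cmath.exp(2j * cmath.pi * alpha * n * n) for n in range(1, 41)]
lhs_phase, rhs_phase = van_der_corput_sides(quadratic_phase)

# A random sequence with entries in the unit square.
random.seed(0)
random_sequence = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(30)]
lhs_random, rhs_random = van_der_corput_sides(random_sequence)
```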

Before stating the main lemma (Lemma 3.5) used to control averages, we give a simple case
that illustrates the technique:

Example. Let $a \colon \mathbb{N} \to \mathbb{C}$ be a sequence that satisfies $a(n)/n^{1/4} \to 0$. Let $(X, \mathcal{X}, \mu)$ be a
probability space, $T \colon X \to X$ be a measure preserving transformation, and $f \in L^\infty(\mu)$
be a function bounded by 1. Then we have that

$$\Big\| \frac{1}{N} \sum_{n=1}^{N} a(n) \cdot T^{n^2} f \Big\|_{L^2(\mu)} \ll \big\| a \cdot \mathbf{1}_{[1,N]} \big\|_{U_3(\mathbb{Z}_{3N})} + o_N(1). \tag{3}$$

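To prove this, we apply van der Corput (Lemma 3.4 for $v(n) = a(n) \cdot T^{n^2} f$) and the
Cauchy-Schwarz inequality and we have

$$\Big\| \frac{1}{N} \sum_{n=1}^{N} a(n) \cdot T^{n^2} f \Big\|_{L^2(\mu)}^{2} \ll \frac{1}{N} \sum_{h_1=1}^{N} \Big\| \frac{1}{N} \sum_{n=1}^{N-h_1} a(n+h_1) \cdot \bar{a}(n) \cdot T^{2nh_1 + h_1^2} f \Big\|_{L^2(\mu)} + \frac{1}{N^2} \sum_{n=1}^{N} |a(n)|^2$$

(note that $\|f\|_{L^\infty(\mu)} \leq 1$). By assumption, the second term is $o_N(1)$ and we are left with
estimating the first term. For $h_1 = 1, \ldots, N$, rewriting the interior sum as

$$\frac{1}{N} \sum_{n=1}^{N} \mathbf{1}_{[1,N]}(n + h_1) \cdot a(n + h_1) \cdot \bar{a}(n) \cdot T^{2nh_1 + h_1^2} f,$$

and applying van der Corput and Cauchy-Schwarz once more, we have that

$$\Big\| \frac{1}{N} \sum_{n=1}^{N-h_1} a(n+h_1) \cdot \bar{a}(n) \cdot T^{2nh_1 + h_1^2} f \Big\|_{L^2(\mu)}^{2} \ll \frac{1}{N} \sum_{h_2=1}^{N} \Big| \frac{1}{N} \sum_{n=1}^{N-h_1-h_2} a(n) \cdot \bar{a}(n+h_1) \cdot \bar{a}(n+h_2) \cdot a(n+h_1+h_2) \Big| + \frac{1}{N^2} \sum_{n=1}^{N-h_1} |a(n+h_1) \cdot a(n)|^2.$$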

Again, by assumption, the average over $h_1 \in \{1, \ldots, N\}$ of the second term is $o_N(1)$. By
further applications of Cauchy-Schwarz, we have that the eighth power of the $L^2(\mu)$-norm
of the original average is bounded by a constant multiple of

$$\frac{1}{N^2} \sum_{1 \leq h_1, h_2 \leq N} \Big| \frac{1}{N} \sum_{n=1}^{N-h_1-h_2} a(n) \cdot \bar{a}(n+h_1) \cdot \bar{a}(n+h_2) \cdot a(n+h_1+h_2) \Big|^2 + o_N(1). \tag{4}$$

On the other hand, letting $a_N(n) = a(n) \cdot \mathbf{1}_{[1,N]}(n)$, for $n = 1, \ldots, 3N$, and thinking of
$a_N$ as a function $\mathbb{Z}_{3N} \to \mathbb{C}$, we have that

$$\|a_N\|_{U_3(\mathbb{Z}_{3N})}^{8} = \mathbb{E}_{h_1, h_2 \in \mathbb{Z}_{3N}} \big| \mathbb{E}_{n \in \mathbb{Z}_{3N}}\, a_N(n) \cdot \bar{a}_N(n+h_1) \cdot \bar{a}_N(n+h_2) \cdot a_N(n+h_1+h_2) \big|^2.$$

(The sums $n + h_1$, $n + h_2$, and $n + h_1 + h_2$ are taken modulo $3N$, and we make the
somewhat less conventional identification of $\mathbb{Z}_{3N}$ with $[1, \ldots, 3N]$.) This is greater than
or equal to (eliminating values with $N < h_1, h_2 \leq 3N$)

$$\frac{1}{9N^2} \sum_{1 \leq h_1, h_2 \leq N} \big| \mathbb{E}_{n \in \mathbb{Z}_{3N}}\, a_N(n) \cdot \bar{a}_N(n+h_1) \cdot \bar{a}_N(n+h_2) \cdot a_N(n+h_1+h_2) \big|^2,$$

where we maintain the same convention on sums. Since in this expression we have
$1 \leq h_1, h_2 \leq N$ and $a_N(n)$ is zero for $n \in \{N+1, \ldots, 3N\}$, we have that all $h_1, h_2, n$
that make a nonzero contribution to this last average satisfy $1 \leq n + h_1 + h_2 \leq 3N$. In
particular, there are no circular effects and the last expression is equal to

$$\frac{1}{9N^2} \sum_{1 \leq h_1, h_2 \leq N} \Big| \frac{1}{3N} \sum_{n=1}^{3N} a_N(n) \cdot \bar{a}_N(n+h_1) \cdot \bar{a}_N(n+h_2) \cdot a_N(n+h_1+h_2) \Big|^2$$

$$= \frac{1}{81N^2} \sum_{1 \leq h_1, h_2 \leq N} \Big| \frac{1}{N} \sum_{n=1}^{N-h_1-h_2} a(n) \cdot \bar{a}(n+h_1) \cdot \bar{a}(n+h_2) \cdot a(n+h_1+h_2) \Big|^2,$$

where the sums $n + h_1$, $n + h_2$, and $n + h_1 + h_2$ are taken in $\mathbb{N}$, without reduction modulo
$3N$. But this expression is exactly 1/81 of the average in (4). Combining these estimates,
we have that the eighth power of the $L^2(\mu)$-norm of the original averages is bounded by
a constant times $\|a_N\|_{U_3(\mathbb{Z}_{3N})}^{8}$ plus an $o_N(1)$ term. Thus we have estimate (3).

We now turn to the general case: 

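Lemma 3.5. Let $\ell, m \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, $T_1, \ldots, T_\ell \colon X \to X$ be
commuting invertible measure preserving transformations, $f_1, \ldots, f_m \in L^\infty(\mu)$ be functions
bounded by 1, and $q_{i,j} \colon \mathbb{Z} \to \mathbb{Z}$, $i \in \{1, \ldots, \ell\}$, $j \in \{1, \ldots, m\}$, be polynomials. Let
$a \colon \mathbb{N} \to \mathbb{C}$ be a sequence of complex numbers satisfying $a(n)/n^c \to 0$ for every $c > 0$.
Then there exists $d \in \mathbb{N}$, depending only on the maximum degree of the polynomials $q_{i,j}$
and the integers $\ell$ and $m$, such that

$$\Big\| \frac{1}{N} \sum_{n=1}^{N} a(n) \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(n)}\big) f_1 \cdot \ldots \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(n)}\big) f_m \Big\|_{L^2(\mu)} \ll \big\| a \cdot \mathbf{1}_{[1,N]} \big\|_{U_d(\mathbb{Z}_{dN})} + o_N(1).$$

Furthermore, the implicit constant is independent of the sequence $(a(n))_{n \in \mathbb{N}}$, and the
$o_N(1)$ term depends only on the integer $d$ and on the sequence $(a(n))_{n \in \mathbb{N}}$.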

Proof. For $i = 1, \ldots, \ell$, let $Q_i = (q_{i,1}, \ldots, q_{i,m})$. If the matrix type of the family
$(Q_1, \ldots, Q_\ell)$ is zero, then all the polynomials are constant, in which case the conclusion
holds trivially for $d = 1$. If the matrix type is nonzero, then by Lemma 3.2 there exists
$(q_1, \ldots, q_\ell) \in (Q_1, \ldots, Q_\ell)$ such that for every $h_1 \in \mathbb{N}$, the family $(q_1, \ldots, q_\ell, h_1)$-vdC$(Q_1, \ldots, Q_\ell)$
has type strictly smaller than that of $(Q_1, \ldots, Q_\ell)$.

As in the model example, using van der Corput and Cauchy-Schwarz, we have that

$$\Big\| \frac{1}{N} \sum_{n=1}^{N} a(n) \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(n)}\big) f_1 \cdot \ldots \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(n)}\big) f_m \Big\|_{L^2(\mu)}^{2^{d+1}} \tag{5}$$

is bounded by an $o_N(1)$ term plus a constant multiple of

$$\frac{1}{N} \sum_{h_1=1}^{N} \Big\| \frac{1}{N} \sum_{n=1}^{N-h_1} \bar{a}(n+h_1) \cdot a(n) \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{h_1,i,1}(n)}\big) g_1 \cdot \ldots \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{h_1,i,2m}(n)}\big) g_{2m} \Big\|_{L^2(\mu)}^{2^{d}},$$

where $(q_{h_1,1,j}, \ldots, q_{h_1,\ell,j}) \in (q_1, \ldots, q_\ell, h_1)$-vdC$(Q_1, \ldots, Q_\ell)$ for every $h_1 \in \mathbb{N}$ and $j =
1, \ldots, 2m$, and each function $g_j$ is equal to one of the functions $f_j$. If the new family of
polynomials has zero matrix type, we stop. If not, as in the model example, we continue
to use van der Corput and Cauchy-Schwarz to bound the average over $n$. By Lemma 3.3,
after a finite number of steps, depending only on the maximum degree of the polynomials
$q_{i,j}$ and the integers $\ell$ and $m$, we have families of polynomials with zero matrix type.
Assume that this takes $d$ steps. We deduce that the expression (5) is bounded by an $o_N(1)$
term (using the assumption that $a(n)/n^c \to 0$ for every $c > 0$ to control the lower order
terms) plus a constant multiple of

$$\frac{1}{N^d} \sum_{1 \leq h_1, \ldots, h_d \leq N} \Big| \frac{1}{N} \sum_{n=1}^{N-h_1-\ldots-h_d} a(n) \cdot \bar{a}(n+h_1) \cdot \bar{a}(n+h_2) \cdot \ldots \cdot a(n+h_1+\ldots+h_d) \Big|^2.$$

(Note that the last occurrence of $a$ in this expression may actually be $\bar{a}$, depending on
the parity of $d$.) As in the model example, we see that this last average is bounded by
a constant (equal to $d^d$) times

$$\big\| a \cdot \mathbf{1}_{[1,N]} \big\|_{U_{d+1}(\mathbb{Z}_{(d+1)N})}^{2^{d+1}},$$

so the asserted estimate holds with $d + 1$ in place of $d$, completing the proof. □

3.4. Comparing averages. The key result needed to compare averages over the primes
and over the integers is (recall that $W = \prod_{p \in \mathbb{P},\, p < w} p$ denotes the product of the primes
bounded by $w$):

Proposition 3.6. Let $\ell, m \in \mathbb{N}$, $(X, \mathcal{X}, \mu)$ be a probability space, $T_1, \ldots, T_\ell \colon X \to X$ be
commuting invertible measure preserving transformations, $f_1, \ldots, f_m \in L^\infty(\mu)$ be functions,
and $q_{i,j} \colon \mathbb{Z} \to \mathbb{Z}$, $i \in \{1, \ldots, \ell\}$, $j \in \{1, \ldots, m\}$, be polynomials. Then the
maximum, taken over those $r$ between 1 and $W$ satisfying $(r, W) = 1$, of the $L^2(\mu)$-norm
of

$$\frac{1}{N} \sum_{n=1}^{N} (\Lambda'_{w,r}(n) - 1) \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(Wn+r)}\big) f_1 \cdot \ldots \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(Wn+r)}\big) f_m$$

converges to 0 as $N \to \infty$ and then $w \to \infty$.

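Proof. We can assume that all functions are bounded by 1. We apply Lemma 3.5 for
$a_{w,r}(n) = \Lambda'_{w,r}(n) - 1$ for $w, r \in \mathbb{N}$, and the family of polynomials $q_{i,j}(Wn + r)$. Taking
maxima over those $r \in [1, W]$ with $(r, W) = 1$, we get that there exists $d \in \mathbb{N}$, independent of $w$ and
$r$, such that

$$\max_{r} \Big\| \frac{1}{N} \sum_{n=1}^{N} (\Lambda'_{w,r}(n) - 1) \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(Wn+r)}\big) f_1 \cdot \ldots \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(Wn+r)}\big) f_m \Big\|_{L^2(\mu)} \ll \max_{r} \big\| (\Lambda'_{w,r} - 1) \cdot \mathbf{1}_{[1,N]} \big\|_{U_d(\mathbb{Z}_{dN})} + o_N(1),$$

where the term $o_N(1)$ depends only on the integers $d$ and $w$. The result now follows from
Theorem 2.2. □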




4. Proof of the main results 

4.1. Proof of Theorem 1.2. We use the following uniform multiple recurrence result,
proved in the same way as Theorem 3.2 is proved in [3]:

Theorem 4.1. Let $(X, \mathcal{X}, \mu)$ be a probability space and $T_1, \ldots, T_\ell \colon X \to X$ be commuting
invertible measure preserving transformations. Let $q_{i,j} \colon \mathbb{Z} \to \mathbb{Z}$ be polynomials with
$q_{i,j}(0) = 0$ for $i = 1, \ldots, \ell$ and $j = 1, \ldots, m$. Then for any $A \in \mathcal{X}$ with $\mu(A) > 0$, there
exists a positive constant $c$, depending only on $\mu(A)$ and the polynomials $q_{i,j}$, such that

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\Big(A \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(n)} A\big) \cap \ldots \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(n)} A\big)\Big) \geq c.$$

It is important to note that the constant $c$ does not depend on the transformations
$T_1, \ldots, T_\ell$. This observation enables us to prove a uniform multiple recurrence result
more suitable for our purposes (the uniformity in $W$ is crucial):

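Corollary 4.2. Let $(X, \mathcal{X}, \mu)$ be a probability space and $T_1, \ldots, T_\ell \colon X \to X$ be commuting
invertible measure preserving transformations. Let $q_{i,j} \colon \mathbb{Z} \to \mathbb{Z}$ be polynomials with
$q_{i,j}(0) = 0$ for $i = 1, \ldots, \ell$ and $j = 1, \ldots, m$. Then for any $A \in \mathcal{X}$ with $\mu(A) > 0$, there
exists a positive constant $c$, depending on $\mu(A)$ and the polynomials $q_{i,j}$, such that for
every $W \in \mathbb{N}$, we have

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\Big(A \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(Wn)} A\big) \cap \ldots \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(Wn)} A\big)\Big) \geq c.$$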

Proof. We write the proof for $\ell = m = 1$, as the general case follows in an analogous
manner. Let $(X, \mathcal{X}, \mu)$ be a probability space and let $T \colon X \to X$ be an invertible
measure preserving transformation. Let $q(n) = c_1 n + \cdots + c_d n^d$, where $c_1, \ldots, c_d \in \mathbb{Z}$
and $d \in \mathbb{N}$. Given $A \in \mathcal{X}$ and $W \in \mathbb{N}$, we have that

$$\mu\big(A \cap T^{q(Wn)} A\big) = \mu\Big(A \cap \big(\prod_{i=1}^{d} S_i^{n^i} A\big)\Big),$$

where $S_i = T^{c_i W^i}$ for $i = 1, \ldots, d$. The result now follows from Theorem 4.1. □
Combining Proposition 3.6 and Corollary 4.2, we have that for sufficiently large $w \in \mathbb{N}$,

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \Lambda'_{w,1}(n) \cdot \mu\Big(A \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(Wn)} A\big) \cap \ldots \cap \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(Wn)} A\big)\Big) > 0.$$

By Lemma 2.1, the conclusion of Theorem 1.2 is satisfied for a set of $n$ with positive
relative density in the shifted primes $\mathbb{P} - 1$.

A similar argument holds for the shifted primes $\mathbb{P} + 1$.




4.2. Proof of Theorem 1.3. To complete the proof, we follow the method used in [TJ.
By Lemma 2.1, it suffices to prove convergence in $L^2(\mu)$ for the corresponding weighted
averages

$$A(N) = \frac{1}{N} \sum_{n=1}^{N} \Lambda'(n) \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(n)}\big) f_1 \cdot \ldots \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(n)}\big) f_m.$$

Equivalently, it suffices to show that the sequence of functions $(A(N))_{N \in \mathbb{N}}$ is Cauchy in
$L^2(\mu)$.

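Let $\varepsilon > 0$. Fix $w, r \in \mathbb{N}$, and let

$$B_{w,r}(N) = \frac{1}{N} \sum_{n=1}^{N} \big(\prod_{i=1}^{\ell} T_i^{q_{i,1}(Wn+r)}\big) f_1 \cdot \ldots \cdot \big(\prod_{i=1}^{\ell} T_i^{q_{i,m}(Wn+r)}\big) f_m.$$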

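(As before, $W$ denotes the product of primes bounded by $w$.) By Proposition 3.6, we
have that for some $w_0 \in \mathbb{N}$ (and corresponding $W_0 \in \mathbb{N}$), if $N$ is large enough, then

$$\Big\| A(W_0 N) - \frac{1}{\phi(W_0)} \sum_{1 \leq r \leq W_0,\, (r, W_0) = 1} B_{w_0,r}(N) \Big\|_{L^2(\mu)} \leq \varepsilon/6. \tag{6}$$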
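The normalization by $1/\phi(W_0)$ in (6) can be unpacked as follows. This is a sketch under the assumption that the modified function is normalized as $\Lambda'_{w,r}(n) = \frac{\phi(W)}{W} \Lambda'(Wn + r)$; the abbreviation $F(n)$ for the product of the iterated functions is introduced here only for brevity and does not appear in the text.

```latex
% Sketch: F(n) abbreviates
%   (\prod_i T_i^{q_{i,1}(n)}) f_1 \cdots (\prod_i T_i^{q_{i,m}(n)}) f_m.
\begin{align*}
A(W_0 N)
  &= \frac{1}{W_0 N} \sum_{n=1}^{W_0 N} \Lambda'(n)\, F(n) \\
  &= \frac{1}{W_0} \sum_{\substack{1 \le r \le W_0 \\ (r, W_0) = 1}}
     \frac{1}{N} \sum_{n=1}^{N} \Lambda'(W_0 n + r)\, F(W_0 n + r) + o_N(1) \\
  &= \frac{1}{\phi(W_0)} \sum_{\substack{1 \le r \le W_0 \\ (r, W_0) = 1}}
     \frac{1}{N} \sum_{n=1}^{N} \Lambda'_{w_0, r}(n)\, F(W_0 n + r) + o_N(1).
\end{align*}
```

The $o_N(1)$ term absorbs the boundary terms and the finitely many primes dividing $W_0$, which are the only primes in residue classes not coprime to $W_0$. Proposition 3.6 then allows $\Lambda'_{w_0,r}(n)$ to be replaced by 1 at a small cost in $L^2(\mu)$, uniformly in $r$, which yields the average of the $B_{w_0,r}(N)$.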



By assumption, for $r = 1, \ldots, W_0$, the sequence $(B_{w_0,r}(N))_{N \in \mathbb{N}}$ converges in $L^2(\mu)$.
Therefore, if $M$ and $N$ are sufficiently large, then for $r = 1, \ldots, W_0$ we have

$$\| B_{w_0,r}(N) - B_{w_0,r}(M) \|_{L^2(\mu)} \leq \varepsilon/6. \tag{7}$$

Combining (6) and (7) we have that if $M$ and $N$ are sufficiently large, then

$$\| A(W_0 N) - A(W_0 M) \|_{L^2(\mu)} \leq \varepsilon/2. \tag{8}$$

Lastly, for $r = 1, \ldots, W_0$, we have

$$\lim_{N \to \infty} \| A(W_0 N + r) - A(W_0 N) \|_{L^2(\mu)} = 0. \tag{9}$$

Combining (8) and (9), it follows that if $M$ and $N$ are sufficiently large, then

$$\| A(N) - A(M) \|_{L^2(\mu)} \leq \varepsilon.$$

Therefore, the sequence $(A(N))_{N \in \mathbb{N}}$ is Cauchy in $L^2(\mu)$, completing the proof of
Theorem 1.3.